The State of Global Terrorism

An In-Depth Analysis of Trends and Threats

Author

Shreehar Joshi

Terrorism has been a constant hindrance to our efforts to achieve global peace and prosperity. From hostage situations and hijackings to mass shootings and bombings, terrorist attacks have a profound impact on both the victims and the larger society: they cause physical harm and loss of life, as well as emotional trauma and psychological distress. They can also have long-lasting socioeconomic consequences, disrupting trade and commerce, causing job losses, and decreasing investor confidence.

Because the number of recorded terrorist attacks has risen sharply in recent years, it is crucial to understand their trends and patterns. In this blog post, I examine various aspects of terrorism, including regions, targets, methods, and motives, using three open-source datasets.

The first dataset, the Global Terrorism Database (GTD), contains information on over 180,000 terrorist attacks worldwide from 1970 to 2017. The other two datasets, World, Region, Country GDP and World Bank National Accounts data, include the Gross Domestic Product (GDP), fertility rate, and net migration of different countries over the same period. All three datasets were retrieved from the popular data science website Kaggle.

I hope this project will shed some light on the phenomenon of global terrorism and equip us better to combat it in the future. So let’s roll up our sleeves and demystify the world of global terrorism.

Analysis

Code
# Import modules
import pandas as pd
import numpy as np
import plotly.express as px
import nltk
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn import neighbors
import tensorflow as tf
from PIL import Image
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, LSTM, SimpleRNN
from tensorflow.keras.layers import Bidirectional, GRU, UpSampling1D
from sklearn.preprocessing import LabelEncoder, RobustScaler
from wordcloud import WordCloud
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.lines import Line2D
import matplotlib.patches as mpatches
import time
import warnings
import bar_chart_race as bcr
warnings.filterwarnings("ignore", category=FutureWarning)

# Read first database
df_attacks = pd.read_csv("../data/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1", low_memory=False)
df_attacks.head()
df_attacks = df_attacks[['eventid','iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 
'provstate', 'city', 'latitude', 'longitude', 'suicide', 'attacktype1_txt', 'targtype1_txt', 
'gname', 'motive', 'weaptype1_txt', 'nkill']]
df_attacks.rename(columns={"eventid": "Event ID", "iyear": "Year", "imonth": "Month", 
"country_txt": "Country", "region_txt": "Region", "provstate": "Province/State", "city": "City", "latitude": "Latitude", 
"longitude": "Longitude", "suicide": "Suicide", "attacktype1_txt": "Attack Type",
"targtype1_txt": "Target Type", "gname": "Terrorist Group", "motive": "Motive", 
"weaptype1_txt": "Weapon Type", "nkill": "Casualties"}, inplace=True)

# Read second database
df_population = pd.read_csv("../data/population.csv")
df_population = df_population[["Country","Year", "Migrants(net)", "FertilityRate"]]
df_population.rename(columns= {"FertilityRate": "Fertility Rate", "Migrants(net)": "Migrants (net)"}, inplace=True)

# Read third database
df_gdp = pd.read_csv("../data/world_country_gdp_usd.csv")
df_gdp = df_gdp[['Country Name','year', 'GDP_USD']]
df_gdp.rename(columns= {"Country Name": "Country", "year": "Year", "GDP_USD":"GDP (in USD)", "GDP_per_capita_USD": "GDP (per capita)"}, inplace=True)

# Read database for the population of the US
df_us_population = pd.read_csv("../data/us_population.csv")
df_us_population = df_us_population[["state", "pop2022"]]
df_us_population.rename(columns= {"state": "State", "pop2022": "Population"}, inplace=True) 

# Show the terrorist attacks as a scatter animation
fig = px.scatter_geo(df_attacks, lon="Longitude", lat="Latitude", animation_frame="Year", color="Region",
                     projection="equirectangular", animation_group="Year", title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.44)
fig.show()

Figure 1: Global Terrorist Attacks

The animation above shows that there were a significant number of terrorist attacks in the US from 1970 to 2017. It is surprising to see this, especially when we consider the effort the US has made over the past 50 years in tackling terrorism in almost every terrorist-prone country.

Which states had the highest number of terrorist attacks? Let’s find out.

Terrorism in the US

Code
# Map US states to their abbreviations
us_state_to_abbrev = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    "Colorado": "CO",
    "Connecticut": "CT",
    "Delaware": "DE",
    "Florida": "FL",
    "Georgia": "GA",
    "Hawaii": "HI",
    "Idaho": "ID",
    "Illinois": "IL",
    "Indiana": "IN",
    "Iowa": "IA",
    "Kansas": "KS",
    "Kentucky": "KY",
    "Louisiana": "LA",
    "Maine": "ME",
    "Maryland": "MD",
    "Massachusetts": "MA",
    "Michigan": "MI",
    "Minnesota": "MN",
    "Mississippi": "MS",
    "Missouri": "MO",
    "Montana": "MT",
    "Nebraska": "NE",
    "Nevada": "NV",
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Ohio": "OH",
    "Oklahoma": "OK",
    "Oregon": "OR",
    "Pennsylvania": "PA",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "Tennessee": "TN",
    "Texas": "TX",
    "Utah": "UT",
    "Vermont": "VT",
    "Virginia": "VA",
    "Washington": "WA",
    "West Virginia": "WV",
    "Wisconsin": "WI",
    "Wyoming": "WY",
    "District of Columbia": "DC",
    "American Samoa": "AS",
    "Guam": "GU",
    "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR",
    "United States Minor Outlying Islands": "UM",
    "U.S. Virgin Islands": "VI",
}

# Filter all the attacks in the US alone
df_attacks_us = df_attacks[df_attacks["Country"] == "United States"] 
df_attacks_us = pd.DataFrame(df_attacks_us.groupby("Province/State")["Event ID"].count())
df_attacks_us = df_attacks_us.reset_index()
df_attacks_us.rename(columns={"Province/State": "State", "Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_attacks_us = df_attacks_us[df_attacks_us["State"] != "Unknown"]
df_attacks_us["State Code"] = df_attacks_us["State"].apply(lambda x: us_state_to_abbrev[x])

# Standardize the terrorism score between 0 and 1
def scale_column(df, column, minVal=float('-inf'), maxVal=float('inf')):
    if minVal == float('-inf'):
        minVal = min(df[column])
    if maxVal == float('inf'):
        maxVal = max(df[column])
    res = []
    for val in df[column]:
        res.append((val - minVal) / (maxVal - minVal))
    return res

# Count the number of terrorist attacks in each US state and standardize the number based on population
df_attacks_us = df_attacks_us.merge(df_us_population[['State', 'Population']])
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = df_attacks_us["Number of Terrorist Attacks"] / df_attacks_us["Population"]
tempVal = scale_column(df_attacks_us, "Number of Terrorist Attacks (Standardised)")
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = tempVal
df_attacks_us = df_attacks_us.sort_values(by="Number of Terrorist Attacks (Standardised)", ascending=False)

# Plot the choropleth for terrorist attacks in the US states
fig = px.choropleth(df_attacks_us, locations='State Code', color='Number of Terrorist Attacks (Standardised)',
                    color_continuous_scale="Viridis",
                    locationmode="USA-states", 
                    scope="usa",
                    labels={'Number of Terrorist Attacks (Standardised)':'No. of Attacks (Standardised)'},
                    title="Terrorist Attacks in the US (1970-2017)")
fig.update_layout(title_x=0.44)
fig.update_layout( legend = {"xanchor": "right", "x": -0, "y":1.9})
fig.update_layout(height=500, width=780)
fig.show()

Figure 2: Terrorist Attacks in the US

Figure 2 shows a per-capita attack score for each US state. The score is calculated by dividing the total number of terrorist attacks recorded in a state by its population (2022 figures) and then rescaling so that the state with the highest rate is assigned a value of 1 and the state with the lowest rate a value of 0. By this measure, New York, Oregon, California, Washington, and Nebraska are the five most terrorist-prone states in the US, while Kentucky, South Carolina, West Virginia, Alaska, and Arkansas are the safest in terms of the frequency of terrorist attacks.
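The per-capita rescaling described above can also be written with vectorized pandas operations. The following is only a minimal sketch on made-up numbers: the helper name `min_max_scale` and the toy states "A", "B", and "C" are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Toy data: attack counts and populations are made up for illustration only.
toy = pd.DataFrame({
    "State": ["A", "B", "C"],
    "Number of Terrorist Attacks": [50, 10, 30],
    "Population": [1_000_000, 2_000_000, 500_000],
})

def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a numeric series linearly so that min -> 0 and max -> 1."""
    span = series.max() - series.min()
    if span == 0:
        # All values identical: avoid division by zero and return zeros.
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span

# Attacks per resident, then rescaled to the [0, 1] range.
toy["Attacks per Capita"] = toy["Number of Terrorist Attacks"] / toy["Population"]
toy["Score"] = min_max_scale(toy["Attacks per Capita"])
print(toy.sort_values("Score", ascending=False))
```

Note that state "C" has fewer attacks than "A" but ends up with the highest score, because the ranking is driven by the rate per resident rather than the raw count.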

So what exactly motivates these terrorist groups and has it changed over the last fifty years?

Code
# Create two word clouds showing the motives of the terrorist attacks.
# First, download the stopwords and add common words from the motives column
nltk.download('stopwords', quiet=True)
stpwrd = nltk.corpus.stopwords.words('english')
extended_list = ["specific",  "motive", "unknown", "Unknown", "incident", "claimed", "responsibility", "however", "unaffiliated", "individual", "identified", "killed", "stated", "anti", "attacks", "protest", "carried", "attack", "trend", "larger", "may", "part", "following", "community", "sources", "violence", "targeting", "noted", "posited", "suspected", "targeting", "members", "noted", "targeted", "also", "assailant", "perpetrator", "meant", "bring attention", "practice", "perpetrator", "assailant", "meant", "bring", "attention"]
stpwrd.extend(extended_list)

# Select all the attacks in the US
df_attacks_us = df_attacks[df_attacks["Country"] == "United States"]
df_attacks_us = df_attacks_us[["Year", "Motive"]]
df_attacks_us = df_attacks_us.dropna()

# Select the subset of the dataset above to only include the years between 1970 and 1999 inclusive.
temp_df = df_attacks_us[(df_attacks_us["Year"] >= 1970) & (df_attacks_us["Year"] < (2000))]
motive = list(temp_df["Motive"].values)
motive = " ".join(motive)

# Plot the word cloud
wordcloud = WordCloud(width=1000, height=800,
                background_color ='white',
                stopwords=stpwrd,
                color_func=lambda *args, **kwargs: "green",
                min_font_size = 10).generate(motive)
plt.figure(figsize = (12, 12), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off")
plt.tight_layout(pad = 2)
plt.title("Attack Motives (" + str(1970) + " - " + str(1999) + ")", fontdict={'fontsize': 36})
plt.show()


# Select the subset of the dataset above to only include the years between 2000 and 2017 inclusive. 
temp_df = df_attacks_us[(df_attacks_us["Year"] >= 2000) & (df_attacks_us["Year"] <= (2017))]
motive = list(temp_df["Motive"].values)
motive = " ".join(motive)

# Plot the word cloud
wordcloud = WordCloud(width=1000, height=800,
                background_color ='white', 
                stopwords=stpwrd,
                color_func=lambda *args, **kwargs: "purple",
                min_font_size = 10).generate(motive)
plt.figure(figsize = (12, 12), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off")
plt.tight_layout(pad = 2)
plt.title("Attack Motives (" + str(2000) + " - " + str(2017) + ")", fontdict={'fontsize': 36})
plt.show()

(a) 1970-1999

(b) 2000-2017

Figure 3: Attack Motives in the US

Both word clouds share a common theme of abortion, suggesting that it has been a prominent source of conflict in the US for several decades. However, the word clouds also differ in significant ways. The first, which covers the pre-2000 period, reveals issues related to Puerto Rico, Vietnam, and African American groups. The second, which covers the period from 2000 onward, shows themes related to Iraq, ISIL, and Islamic states. These topics suggest an increase in religiously motivated attacks in the US over the past 20 years. The shift reflects changes in the political landscape both domestically and internationally: from conflicts over communism and racism to religiously motivated terrorism.
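A word cloud sizes each word roughly by how often it occurs once stop words are removed. The sketch below shows that counting step in plain Python on made-up motive snippets. The `word_frequencies` helper and the short stop-word set are assumptions for illustration, and the WordCloud library's own tokenization is more elaborate than this.

```python
from collections import Counter
import re

# Made-up motive snippets, for illustration only.
motives = [
    "The attack was a protest against abortion",
    "Unknown",
    "The incident was meant to protest abortion clinics",
]
# A tiny stand-in for the extended stop-word list used in the analysis.
stop_words = {"the", "was", "a", "to", "against", "unknown",
              "attack", "incident", "meant", "protest"}

def word_frequencies(texts, stop_words):
    """Count lowercase word occurrences, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

freqs = word_frequencies(motives, stop_words)
print(freqs.most_common(3))
```

Words such as "unknown" and "incident" appear in most motive descriptions, so leaving them in would drown out the topical terms.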

Global Terrorism

Now that we have analyzed the state of terrorism in the US, let’s look at the bigger picture. We begin by analyzing how the frequency of terrorist attacks has changed over the last 50 years.

Code
# Count the number of terrorist attacks each year and plot a bar chart.
yearly_freq = pd.DataFrame(df_attacks.groupby("Year")["Event ID"].count()).reset_index()
yearly_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
fig = px.bar(yearly_freq, x=yearly_freq["Year"], y=yearly_freq["Number of Terrorist Attacks"], title="Frequency of Terrorist Attacks (1970-2017)")
fig.update_layout(title_x=0.5)
fig.update_layout(height=400)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 4: Frequency of Terrorist Attacks

It is clear from Figure 4 that the number of terrorist attacks was at its lowest around the years 1972 and 2003 and has greatly increased over the last 10 years in the dataset (2007-2017). Note that the gap at 1993 reflects missing data in the GTD, not zero attacks.
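A grouped count simply omits years with no rows, so a gap in the data is easy to misread as zero attacks. One way to surface such gaps, sketched here on made-up counts, is to reindex the yearly series over the full range of years; the variable names are illustrative assumptions.

```python
import pandas as pd

# Toy yearly counts with one year absent (numbers are made up).
toy_counts = pd.Series({1991: 40, 1992: 55, 1994: 30, 1995: 25},
                       name="Number of Terrorist Attacks")

# Reindex over the full range so absent years become NaN instead of vanishing.
full_range = range(toy_counts.index.min(), toy_counts.index.max() + 1)
complete = toy_counts.reindex(full_range)

# Years with NaN are missing from the data, which is different from a count of 0.
missing_years = complete[complete.isna()].index.tolist()
print(missing_years)
```

Keeping the missing year as NaN rather than filling it with 0 means a chart shows a gap instead of a misleading empty bar.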

So, what parts of the world have experienced the highest number of terrorist attacks?

Code
# Count the number of terrorist attacks in each geographical region, grouped by attack type.
region_freq = pd.DataFrame(df_attacks.groupby(["Region", "Attack Type"])["Event ID"].count()).reset_index()
region_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
region_freq = region_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)
region_freq['Attack Type'] = region_freq['Attack Type'].replace(['Bombing/Explosion', 'Hostage Taking (Kidnapping)', 'Facility/Infrastructure Attack', 'Hostage Taking (Barricade Incident)'], ['Bombing', 'Hostage', 'Facility Attack', 'Hostage (Barr.)'])

# Plot the bar chart.
fig = px.bar(region_freq, x=region_freq["Region"], y=region_freq["Number of Terrorist Attacks"], color="Attack Type", height=400, title="Terrorist Attacks in Different Regions", barmode="relative")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 5: Terrorist Attacks in different Regions

Figure 5 shows that the Middle East & North Africa, South Asia, and South America were the three most terrorist-prone regions. On the other hand, Australasia & Oceania, Central Asia, and East Asia were the safest regions in terms of terrorism. It is also worth noting that in all geographical regions, bombings and armed assaults were the most common forms of attack.

Let’s delve deeper to see which countries from these terrorist-prone regions were contributing the highest number of terrorist incidents.

Code
# Count the number of casualties in each country.
df_countries_casualties = pd.DataFrame(df_attacks.groupby(["Country"])["Casualties"].sum().reset_index())
df_countries_terrorist_count = pd.DataFrame(df_attacks.groupby(["Country"])["Event ID"].count().reset_index())
df_countries_terrorist_count.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_merged_casualties_count = df_countries_casualties.merge(df_countries_terrorist_count[["Country", "Number of Terrorist Attacks"]])

# Map the country names to their ISO codes
df_iso_codes = px.data.gapminder()[["country", "iso_alpha"]]
df_iso_codes.rename(columns={"country": "Country", "iso_alpha": "Country Code"}, inplace=True)
df_iso_codes.drop_duplicates(inplace=True)
df_iso_codes = df_iso_codes.reset_index()
df_iso_codes.drop(["index"], axis=1, inplace=True)
df_countries_terrorist_count = df_countries_terrorist_count.merge(df_iso_codes[['Country', 'Country Code']])
df_countries_terrorist_count["No. of Attacks"] = df_countries_terrorist_count["Number of Terrorist Attacks"]

# Plot a choropleth representing the frequency of attacks in different countries.
fig = px.choropleth(df_countries_terrorist_count, locations="Country Code",
                    color="No. of Attacks",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.show()

Figure 6: Countries with the Highest Number of Attacks

Figure 6 shows that Iraq in the Middle East; Afghanistan, Pakistan, and India in South Asia; and Colombia in South America were the most terrorist-prone countries.
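The choropleth behind Figure 6 relies on joining country names to ISO codes, and a plain merge silently drops any country whose name is spelled differently in the two tables. The sketch below, on made-up toy tables, shows one way to list such unmatched names using the `indicator` option of `DataFrame.merge`. The table contents are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for the attack counts and the ISO lookup (values are made up).
counts = pd.DataFrame({
    "Country": ["Iraq", "Colombia", "West Germany (FRG)"],
    "Number of Terrorist Attacks": [300, 120, 15],
})
iso = pd.DataFrame({
    "Country": ["Iraq", "Colombia", "Germany"],
    "Country Code": ["IRQ", "COL", "DEU"],
})

# A left join keeps every country from the attack data;
# indicator=True adds a "_merge" column showing whether a match was found.
merged = counts.merge(iso, on="Country", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
print(unmatched)
```

Any name in `unmatched` would be left blank on the map, so it is worth checking this list before reading too much into empty regions.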

The analysis of global terrorism is incomplete without information on terrorist groups. How about we visualize the top 15 most notorious terrorist groups based on the number of casualties from their attacks?

Code
# Count the number of casualties for different terrorist groups
groupwise_casualty_freq = pd.DataFrame(df_attacks.groupby("Terrorist Group")["Casualties"].sum()).reset_index()
groupwise_casualty_freq = groupwise_casualty_freq.sort_values(by="Casualties", ascending=False)[:16]
notorious_groups = list(groupwise_casualty_freq["Terrorist Group"])
notorious_groups.remove("Unknown")
df_notorious_groups = df_attacks[df_attacks["Terrorist Group"].isin(notorious_groups)]
df_notorious_groups = pd.DataFrame(df_notorious_groups.groupby(["Terrorist Group", "Year"])["Casualties"].sum().reset_index())
df_notorious_groups["Terrorist Group"] = df_notorious_groups["Terrorist Group"].replace(["Farabundo Marti National Liberation Front (FMLN)", "Islamic State of Iraq and the Levant (ISIL)", "Kurdistan Workers' Party (PKK)", "Liberation Tigers of Tamil Eelam (LTTE)", "New People's Army (NPA)", "Nicaraguan Democratic Force (FDN)", "Revolutionary Armed Forces of Colombia (FARC)", "Shining Path (SL)", "Tehrik-i-Taliban Pakistan (TTP)"], ["Farabundo Liberation", "ISIL", "Kurdistan W.", "Tamil Tigers", "New People's Army", "Nicaraguan Force", "Colombian Force", "Shining Path", "Taliban Pakistan"])

# Plot a line chart for the 15 terrorist groups with the highest number of casualties
fig = px.line(df_notorious_groups, x="Year", y="Casualties", color="Terrorist Group", title='Attacks by different Terrorist Groups')
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 7: Attacks by Different Terrorist Groups

One cannot fail to notice the peak in 2001, which corresponds to Al Qaida’s suicide attacks against the United States (widely known as the 9/11 attacks) and is often taken as the beginning of the rise of other Islamic religious extremist terrorist groups. Another interesting finding: judging by the steep lines after 2010 in Figure 7, the Taliban, Boko Haram, and ISIL appear to have killed more people than the other 12 terrorist groups combined over the last 50 years.

So what exactly do these terrorist groups target? Let’s find out.

Code
# Select the most common targets of the terrorist groups.
TOP_N = 11
target_freq = pd.DataFrame(df_attacks.groupby("Target Type")["Event ID"].count()).reset_index()
target_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
rem_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[TOP_N:]
target_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:TOP_N]
target_freq = target_freq[target_freq['Target Type'] != "Unknown"]

# Plot a bar chart.
fig = px.bar(target_freq, x='Target Type', y='Number of Terrorist Attacks', title="Common Targets of Terrorist Attacks")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 8: Common Targets of Terrorist attacks

Most of the attacks have targeted private citizens & property, the military, and the police. Private citizens and their property are generally the easiest targets to attack and also make up the largest group. This might be one possible explanation for such a high number of attacks on them.

Socioeconomic Aspects

Now, let’s change our direction a little bit. We will analyze how terrorism is related to different socioeconomic factors like GDP and fertility rate.

Code
# Maps a country to its geographical region
def map_region(country):
    region = list(df_attacks[df_attacks["Country"] == country]["Region"])[0]
    return region

# Find the top 5 countries with the highest number of terrorist attacks
country_freq = pd.DataFrame(df_attacks.groupby("Country")["Event ID"].count()).reset_index()
country_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
country_freq = country_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:10]
country_freq["Region"] = country_freq["Country"].apply(map_region)
top_five_countries = list(country_freq["Country"].values)[:5]

# Count the yearly frequency of terrorist attacks of the top five countries.
country_freq_year = pd.DataFrame(df_attacks.groupby(["Year", "Country"])["Event ID"].count().reset_index())
country_freq_year = country_freq_year[country_freq_year["Country"].isin(top_five_countries)]
country_freq_year.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)

# Select only the GDP records of the top five countries, then compute the yearly mean GDP across all countries.
df_terrorist_gdp = df_gdp[(df_gdp["Country"].isin(top_five_countries)) & ((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_gdp[((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_all_gdp.dropna()
df_all_gdp = df_all_gdp.groupby("Year")["GDP (in USD)"].mean().reset_index()
df_all_gdp.rename(columns={"GDP (in USD)": "World"}, inplace=True)

# Assign a specific color to each of the countries. World will take the black color.
colorList = list(px.colors.qualitative.T10)
if colorList[0] != "black":
    colorList.insert(0, "black")
for country in top_five_countries:
    temp_gdp = df_terrorist_gdp[df_terrorist_gdp["Country"] == country]
    df_all_gdp[country] = list(temp_gdp["GDP (in USD)"])

# Plot a line chart showing the GDP of the top five countries and the world.
fig = px.line(df_all_gdp, x='Year', y=df_all_gdp.columns[1:], title="GDP of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={
                     "value": "GDP (in USD)",
                     "variable": ""
                 })
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()

# Select the fertility rate of the top five countries. 
df_all_fertility = df_population[(df_population["Year"] >= 1970) & (df_population["Year"] <= 2017)]
df_terrorist_fertility = df_population[(df_population["Country"].isin(top_five_countries)) & ((df_population["Year"] >= 1970) & (df_population["Year"] <= 2017))]
df_all_fertility = df_all_fertility.dropna()
df_all_fertility = df_all_fertility.drop(['Migrants (net)'], axis=1)
df_all_fertility = df_all_fertility.groupby("Year")["Fertility Rate"].mean().reset_index()
df_all_fertility.rename(columns={"Fertility Rate": "World"}, inplace=True)
for country in top_five_countries:
    temp_fertility = df_terrorist_fertility[df_terrorist_fertility["Country"] == country]
    df_all_fertility[country] = list(temp_fertility["Fertility Rate"])

# Plot a line chart showing the fertility rate of the top five countries along with the same for the world.
fig = px.line(df_all_fertility, x='Year', y=df_all_fertility.columns[1:], title="Fertility Rate of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={
                     "value": "Fertility Rate",
                     "variable": ""
                 })
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()

(a) GDP

(b) Fertility Rate

Figure 9: Socio-economic Aspects of Terrorist-prone Countries

Figure 9 shows the GDP and fertility rate of the five most terrorist-prone countries identified above. These countries generally have a lower GDP and a higher fertility rate than the global average over the given period. India is an exception, with its GDP increasing at a faster rate than the global average. Colombia is another, with its fertility rate below the global average from the mid-1980s onward.
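The "World" line in these charts is the unweighted mean of the per-country values in each year, not total world GDP. A minimal sketch of that comparison on made-up numbers follows; the countries "X", "Y", and "Z" and all values are assumptions for illustration.

```python
import pandas as pd

# Toy long-format GDP table (all numbers are made up).
toy_gdp = pd.DataFrame({
    "Country": ["X", "Y", "Z", "X", "Y", "Z"],
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "GDP (in USD)": [10.0, 20.0, 60.0, 12.0, 24.0, 54.0],
})

# One row per year, one column per country.
wide = toy_gdp.pivot(index="Year", columns="Country", values="GDP (in USD)")
# Unweighted mean across countries: the "World" baseline.
wide["World"] = wide.mean(axis=1)

# True where a country's GDP is below the cross-country mean in that year.
below_world = wide[["X", "Y"]].lt(wide["World"], axis=0)
print(wide)
print(below_world)
```

Working in wide format also aligns the years automatically, so a country with missing years gets NaN rather than shifting its values onto the wrong rows.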

Machine and Deep Learning

Figure 10: A Simple Feed Forward Neural Network

Code
# Remove the columns we will not be using for the modeling.
# errors="ignore" skips any column that has already been removed.
df_attacks = df_attacks.drop(columns=["Event ID", "Motive", "Latitude", "Longitude"], errors="ignore")
df_attacks = df_attacks.dropna()
df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']] = df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']].apply(LabelEncoder().fit_transform)

# Split into predictor and response variables.
y = df_attacks["Casualties"]
X = df_attacks.drop(['Casualties'], axis=1)

# Split the data into train (70%), validation (15%), and test (15%) sets.
# The second split takes 0.15 / 0.85 of the remainder so that validation is 15% of the full dataset.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.15 / 0.85, random_state=42)

# Scale the dataset. Fit the scaler on the training set only, then reuse it for the other sets.
scaler = RobustScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)


# Neural Networks
def create_bilstm():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, activation='relu', input_shape=(12,1), return_sequences=True)))
    model.add(Dropout(0.2))
    model.add(Bidirectional(LSTM(64, activation='relu')))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    return model

def create_ffnn():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(12,)))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='sigmoid'))
    model.add(Dense(16, activation='tanh'))
    model.add(Dense(1))
    return model

def create_cnn():
    model = Sequential()
    model.add(Conv1D(32, 3, activation='relu', input_shape=(12,1)))
    model.add(MaxPooling1D(2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D(2))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1))
    return model

def create_gru():
    model = Sequential()
    model.add(GRU(64, activation='tanh', input_shape=(12,1)))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    return model

# Result container
result = []
dlModels = {"Feed Forward NN": create_ffnn(), "CNN": create_cnn(), "GRU": create_gru(), "Bi-LSTM": create_bilstm()}


# Reshape the train, test and validation sets.
X_train_new = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val_new = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test_new = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Train the neural networks one at a time.
# The feed-forward network takes 2-D input; the CNN, GRU, and Bi-LSTM expect the reshaped 3-D input.
for name, model in dlModels.items():
    start_time = time.time()
    model.compile(optimizer='adam', loss='mse')
    if name == "Feed Forward NN":
        model.fit(X_train, y_train, epochs=20, batch_size=128, validation_data=(X_val, y_val))
        y_pred = model.predict(X_test)
    else:
        model.fit(X_train_new, y_train, epochs=20, batch_size=128, validation_data=(X_val_new, y_val))
        y_pred = model.predict(X_test_new)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, y_pred)), 2), round(time.time() - start_time, 2)])

# Train the machine learning models one at a time.
mlModels = {"Random Forest": RandomForestRegressor(), "K Neighbors": neighbors.KNeighborsRegressor(), "Decision Trees": DecisionTreeRegressor()}
for name, model in mlModels.items():
    start_time = time.time()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, pred)), 2), round(time.time() - start_time, 2)])

# Save the results in a csv file.
pd.options.display.float_format = '{:.2f}'.format
result_df = pd.DataFrame(result, columns=["Model", "Root Mean Squared Error", "Time (in seconds)"])
result_df.to_csv("./results.csv", index=False)

Let’s take machine and deep learning algorithms out of our arsenal and tackle the problem of predicting the number of casualties for a given attack based on its date, country, region, state, city, whether it was a suicide attack, attack type, target type, terrorist group, and weapon type. Such a model could prove valuable for intelligence groups in assessing the severity of potential attacks and preparing for them.

The dataset is split into train, validation, and test sets in the ratio 70:15:15. The train and validation sets are used during the training phase, and the test set is used to assess the models based on their root-mean-squared (RMS) error and the time they take to train and predict.
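One way to obtain a 70:15:15 split with two calls to `train_test_split` is to hold out 15% for testing and then take 0.15 / 0.85 of the remainder for validation. The sketch below uses synthetic data, and all variable names are illustrative assumptions. It also fits the scaler on the training rows only, so that no information from the validation or test sets influences the scaling.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(1000, 12))   # 1,000 synthetic rows with 12 features
y_toy = rng.poisson(2.0, size=1000)   # synthetic non-negative counts

# First hold out 15% for testing, then 0.15 / 0.85 of the rest for validation,
# which works out to roughly 15% of the full dataset.
X_rest, X_te, y_rest, y_te = train_test_split(X_toy, y_toy, test_size=0.15, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X_rest, y_rest, test_size=0.15 / 0.85, random_state=42)

# Fit the scaler on the training rows only and reuse it for the other sets.
toy_scaler = RobustScaler().fit(X_tr)
X_tr_s = toy_scaler.transform(X_tr)
X_va_s = toy_scaler.transform(X_va)
X_te_s = toy_scaler.transform(X_te)

for label, part in [("train", X_tr), ("validation", X_va), ("test", X_te)]:
    print(f"{label}: {len(part) / len(X_toy):.1%}")
```

Because of rounding inside `train_test_split`, the set sizes can differ from the exact 70:15:15 proportions by a row or so.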

We use four different deep learning models (Feed Forward Neural Network, Bi-directional Long Short-Term Memory, Convolutional Neural Network, and Gated Recurrent Unit) and three other machine learning models (Random Forest, K-Nearest Neighbors, and Decision Trees) for this problem. Each of the neural networks has between five and seven layers, chosen based on their performance on the validation set.

The results are shown in Figure 11.

Code
# Read the results from the csv written by the training step.
result_df = pd.read_csv("./results.csv")
result_df = result_df.sort_values(by=['Root Mean Squared Error'])

# Plot the data.
matplotlib.rc_file_defaults()
ax1 = sns.set_style(style=None, rc=None)
fig, ax1 = plt.subplots(figsize=(12,6))
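# Color each bar by model family so the colors stay correct whatever the sort order.
dl_models = {"Feed Forward NN", "CNN", "GRU", "Bi-LSTM"}
colors = ["#5D3FD3" if model in dl_models else "#0096FF" for model in result_df["Model"]]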

# Plot the bar chart and set figure options.
sns.barplot(data = result_df, x='Model', y='Root Mean Squared Error', alpha=0.5, ax=ax1, palette=colors)
ax1.set_xticklabels(ax1.get_xticklabels(), fontsize=12)
ax1.set_xlabel("Models", fontsize=14)
ax1.set_ylabel("Root Mean Squared Error", fontsize=14)
ax1.set_title("Efficiency of Models", fontsize=16)

# Plot the lineplot on the same chart and change the alpha level of the charts.
ax2 = ax1.twinx()
ax2.set_ylabel("Time (in seconds)", fontsize=14)
dl = mpatches.Patch(color="#5D3FD3", alpha=0.5)
ml = mpatches.Patch(color="#0096FF", alpha=0.5)
custom_line = [Line2D([0], [0], color='#0096FF', lw=2), dl, ml]
leg = plt.legend(custom_line, ["Time", "DL Models", "ML Models"], loc="upper left")
sns.lineplot(data = list(result_df["Time (in seconds)"]), marker='o', ax=ax2, color='#0096FF')
plt.show()

Figure 11: Efficiency of Models

The Feed Forward Neural Network turns out to be the most accurate model, achieving an RMS error of 8.68, and Decision Trees is the fastest, completing training and prediction in 0.99 seconds. In general, the neural networks have a lower RMS error than the other machine learning models, but they are also slower to train and test. Even the lowest RMS error, 8.68, is far higher than the average number of casualties per attack, 2.40. That is a big gap, so these models still need considerable work before they are usable in practice.
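When the target is heavily skewed, with most attacks causing few casualties and a handful causing very many, the RMS error can exceed the mean by a wide margin. A useful reference point is the RMS error of a baseline that always predicts the mean. The sketch below uses made-up casualty counts; the `rmse` helper is an illustrative assumption, and in practice the baseline mean would come from the training set.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up casualty counts: mostly small values plus one large outlier.
y_true = np.array([0, 1, 0, 2, 1, 0, 3, 1, 0, 92])

# Baseline: always predict the mean of the targets.
baseline_pred = np.full_like(y_true, y_true.mean(), dtype=float)
print(round(float(y_true.mean()), 2), round(rmse(y_true, baseline_pred), 2))
```

A model is only informative to the extent that its RMS error falls below this baseline, so reporting both numbers side by side gives a fairer picture than comparing the error with the mean alone.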

Our analysis ends here, but in the future we will explore more variables in the terrorism database, along with other socioeconomic factors and their relationship with terrorist attacks. We will also perform extensive hyperparameter tuning and train more sophisticated models, such as variants of FractalNets, ResNets, and XceptionNets, on a larger dataset that combines and feature-engineers these socioeconomic factors, with the aim of lowering the RMS error as far as possible.

More Animations

And before you go, here’s a little treat for your eyes.

Code
# Code for bar chart race for countries with highest number of terrorist attacks.
# Yearly data is found for each of the countries.
df_countries_pivot = pd.DataFrame(df_attacks.groupby(["Country", "Year"]).count()).reset_index()
df_countries_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_countries_pivot = df_countries_pivot.pivot_table(values = 'Number of Terrorist Attacks',index = ['Year'], columns = 'Country')
df_countries_pivot.fillna(0, inplace=True)
df_countries_pivot = df_countries_pivot.sort_index()
# Accumulate over the years for every country, including the last column.
df_countries_pivot = df_countries_pivot.cumsum()
bcr.bar_chart_race(df = df_countries_pivot,
                   n_bars = 10,
                   period_length=1000,
                   sort='desc',
                   title="Countries with the Highest Number of Terrorist Attacks",
                   filter_column_colors=True,
                   filename = None)

# Code for bar chart race for terrorist attacks based on geographical regions.
df_region_pivot = pd.DataFrame(df_attacks.groupby(["Region", "Year"]).count()).reset_index()
df_region_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_region_pivot = df_region_pivot.pivot_table(values = 'Number of Terrorist Attacks',index = ['Year'], columns = 'Region')
df_region_pivot.fillna(0, inplace=True)
df_region_pivot = df_region_pivot.sort_index()
# Accumulate over the years for every region, including the last column.
df_region_pivot = df_region_pivot.cumsum()
bcr.bar_chart_race(df = df_region_pivot, 
                   n_bars = 12,
                   period_length=1000,
                   sort='desc',
                   title="Terrorist Attacks Based on Geographical Regions",
                   filter_column_colors=True,
                   filename = None)

# Code for bar chart race for terrorist groups with the highest number of attacks.
df_animation_pivot = df_notorious_groups.pivot_table(values = 'Casualties',index = ['Year'], columns = 'Terrorist Group')
df_animation_pivot.fillna(0, inplace=True)
df_animation_pivot = df_animation_pivot.sort_index()
df_animation_pivot = df_animation_pivot.drop(columns=["Unknown"])
# Accumulate over the years for every group, including the last column.
df_animation_pivot = df_animation_pivot.cumsum()
bcr.bar_chart_race(df = df_animation_pivot, 
                   n_bars = 10, 
                   period_length=1000,
                   sort='desc',
                   title="Terrorist Groups with the Highest Number of Attacks",
                   filename = None)
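All three animations above feed `bar_chart_race` a wide table: one row per year, one column per category, and running totals as values. As a rough illustration of that shape, the plain-Python sketch below builds the same kind of cumulative table without pandas. The `attacks` records and the `cumulative_counts` helper are hypothetical and exist only for this example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical records: one (country, year) pair per attack.
attacks = [
    ("A", 2000), ("A", 2000), ("B", 2000),
    ("A", 2002), ("B", 2002), ("B", 2002),
]


def cumulative_counts(records):
    """Return the sorted years and a {country: {year: running total}} table."""
    counts = Counter(records)
    countries = sorted({country for country, _ in records})
    years = sorted({year for _, year in records})
    table = {}
    for country in countries:
        # Yearly counts (0 where a country has no attack), then a running sum.
        yearly = [counts[(country, year)] for year in years]
        table[country] = dict(zip(years, accumulate(yearly)))
    return years, table


years, table = cumulative_counts(attacks)
for country, totals in table.items():
    print(country, totals)
```

Every column is accumulated here, so each bar can only grow or stay level from one year to the next, which is the behavior a cumulative race chart relies on.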

References

Countries in the world by population (2023). Worldometer. Retrieved February 5,
   2023, from https://www.worldometers.info/world-population/population-by-country/

Information on more than 200,000 terrorist attacks. Global Terrorism Database.
   Retrieved February 5, 2023, from https://www.start.umd.edu/gtd/

Lai, N. T. C. (2023, February 3). Word population (1955-2020). Kaggle. Retrieved February
   5, 2023, from https://www.kaggle.com/datasets/nguyenthicamlai/population-2022

Mishinev, T. (2022, September 9). World, region, country GDP/GDP per capita. Kaggle.
   Retrieved February 5, 2023, from
   https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021

National Consortium for the Study of Terrorism and Responses to Terrorism. Global
   terrorism database. Kaggle. Retrieved February 5, 2023, from
   https://www.kaggle.com/datasets/START-UMD/gtd

World Bank. GDP (current US$). GDP National Accounts. Retrieved February 5, 2023, from
   https://data.worldbank.org/indicator/NY.GDP.MKTP.CD